{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "08eb8b05",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "#  16\\.  分类模型评价方法  # \n",
    "\n",
    "##  16.1.  介绍  # \n",
    "\n",
    "前面的分类实验中，我们使用了准确率这一种方法对模型进行评价。实际上，分类模型的评价方法还有很多，本次实验将会了解其它常用方法以便于对分类模型评价有更全面的掌握。 \n",
    "\n",
    "##  16.2.  知识点  # \n",
    "\n",
    "  * 准确率 \n",
    "\n",
    "  * 查准率 \n",
    "\n",
    "  * 召回率 \n",
    "\n",
    "  * F1 值 \n",
    "\n",
    "  * ROC 曲线 \n",
    "\n",
    "分类模型的评价方法前面的实验中仅介绍了准确率这一种，那么接下来我们将全面了解分类模型常用的评价指标。为了更好地理解，这里将使用逻辑回归来建立信用卡持卡人风险分类预测模型。 \n",
    "\n",
    "##  16.3.  数据集介绍  # \n",
    "\n",
    "数据集为 CSV 文件，可以使用 Pandas 读取数据集并预览。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cfdd59e",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "wget -nc https://cdn.aibydoing.com/aibydoing/files/credit_risk_train.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "7f630c2d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>BILL_1</th>\n",
       "      <th>BILL_2</th>\n",
       "      <th>BILL_3</th>\n",
       "      <th>BILL_4</th>\n",
       "      <th>BILL_5</th>\n",
       "      <th>BILL_6</th>\n",
       "      <th>AGE</th>\n",
       "      <th>SEX</th>\n",
       "      <th>EDUCATION</th>\n",
       "      <th>MARRIAGE</th>\n",
       "      <th>RISK</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>37</td>\n",
       "      <td>Female</td>\n",
       "      <td>Graduate School</td>\n",
       "      <td>Married</td>\n",
       "      <td>LOW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>8525</td>\n",
       "      <td>5141</td>\n",
       "      <td>5239</td>\n",
       "      <td>7911</td>\n",
       "      <td>17890</td>\n",
       "      <td>10000</td>\n",
       "      <td>25</td>\n",
       "      <td>Male</td>\n",
       "      <td>High School</td>\n",
       "      <td>Single</td>\n",
       "      <td>HIGH</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>628</td>\n",
       "      <td>662</td>\n",
       "      <td>596</td>\n",
       "      <td>630</td>\n",
       "      <td>664</td>\n",
       "      <td>598</td>\n",
       "      <td>39</td>\n",
       "      <td>Male</td>\n",
       "      <td>Graduate School</td>\n",
       "      <td>Married</td>\n",
       "      <td>HIGH</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4649</td>\n",
       "      <td>3964</td>\n",
       "      <td>3281</td>\n",
       "      <td>934</td>\n",
       "      <td>467</td>\n",
       "      <td>12871</td>\n",
       "      <td>41</td>\n",
       "      <td>Female</td>\n",
       "      <td>Graduate School</td>\n",
       "      <td>Single</td>\n",
       "      <td>HIGH</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>46300</td>\n",
       "      <td>10849</td>\n",
       "      <td>8857</td>\n",
       "      <td>9658</td>\n",
       "      <td>9359</td>\n",
       "      <td>9554</td>\n",
       "      <td>55</td>\n",
       "      <td>Female</td>\n",
       "      <td>High School</td>\n",
       "      <td>Married</td>\n",
       "      <td>HIGH</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   BILL_1  BILL_2  BILL_3  BILL_4  BILL_5  BILL_6  AGE     SEX  \\\n",
       "0       0       0       0       0       0       0   37  Female   \n",
       "1    8525    5141    5239    7911   17890   10000   25    Male   \n",
       "2     628     662     596     630     664     598   39    Male   \n",
       "3    4649    3964    3281     934     467   12871   41  Female   \n",
       "4   46300   10849    8857    9658    9359    9554   55  Female   \n",
       "\n",
       "         EDUCATION MARRIAGE  RISK  \n",
       "0  Graduate School  Married   LOW  \n",
       "1      High School   Single  HIGH  \n",
       "2  Graduate School  Married  HIGH  \n",
       "3  Graduate School   Single  HIGH  \n",
       "4      High School  Married  HIGH  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv(\"credit_risk_train.csv\")  # 读取数据文件\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46359493",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "|  BILL_1  |  BILL_2  |  BILL_3  |  BILL_4  |  BILL_5  |  BILL_6  |  AGE  |  SEX  |  EDUCATION  |  MARRIAGE  |  RISK   \n",
    "---|---|---|---|---|---|---|---|---|---|---|---  \n",
    "0  |  0  |  0  |  0  |  0  |  0  |  0  |  37  |  Female  |  Graduate School  |  Married  |  LOW   \n",
    "1  |  8525  |  5141  |  5239  |  7911  |  17890  |  10000  |  25  |  Male  |  High School  |  Single  |  HIGH   \n",
    "2  |  628  |  662  |  596  |  630  |  664  |  598  |  39  |  Male  |  Graduate School  |  Married  |  HIGH   \n",
    "3  |  4649  |  3964  |  3281  |  934  |  467  |  12871  |  41  |  Female  |  Graduate School  |  Single  |  HIGH   \n",
    "4  |  46300  |  10849  |  8857  |  9658  |  9359  |  9554  |  55  |  Female  |  High School  |  Married  |  HIGH   \n",
    "  \n",
    "该数据集包含 10 列特征，以及一列类别标签。其中： \n",
    "\n",
    "  * 第 1～6 列为客户近期历史账单信息。（特征） \n",
    "\n",
    "  * 第 7 列为该客户年龄。（特征） \n",
    "\n",
    "  * 第 8 列为该客户性别。（特征） \n",
    "\n",
    "  * 第 9 列为该客户教育程度。（特征） \n",
    "\n",
    "  * 第 10 列为该客户婚姻状况。（特征） \n",
    "\n",
    "  * 第 11 列为客户持卡风险状况。（分类标签：LOW, HIGH） \n",
    "\n",
    "我们的目的，是利用该数据集训练一个信用卡持卡人风险预测模型，并对模型进行评价。首先，按照机器学习建模的流程，需要将数据集划分为训练集和测试集。 \n",
    "\n",
    "虽然上面的数据集看起来已经非常整洁，但是你会发现第 7，8，9 列的特征数据为类别型（Female / Male）。所以，这里我们在划分数据集的同时，会使用数据预处理中介绍过的独热编码将类别型特征转换为数值型特征。 "
   ]
  },
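  {
   "cell_type": "markdown",
   "id": "b7e41c02",
   "metadata": {},
   "source": [
    "As a small illustration of one-hot encoding, the sketch below applies `pd.get_dummies` to a hypothetical two-row frame. The name `toy` and its values are made up for illustration and are not rows from the dataset. Numeric columns pass through unchanged, while each category of a string column becomes its own indicator column.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, not rows from credit_risk_train.csv\n",
    "toy = pd.DataFrame({\"AGE\": [37, 25], \"SEX\": [\"Female\", \"Male\"]})\n",
    "\n",
    "# AGE is numeric and is kept as is; SEX is expanded into one column per category\n",
    "encoded = pd.get_dummies(toy)\n",
    "print(list(encoded.columns))\n",
    "```\n",
    "\n",
    "The new column names are built from the original column name and the category value, which is why the encoded dataset ends up with more columns than the raw one."
   ]
  },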
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8223cf9",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   BILL_1  BILL_2  BILL_3  BILL_4  BILL_5  BILL_6  AGE  SEX_Female  SEX_Male  \\\n",
      "0       0       0       0       0       0       0   37        True     False   \n",
      "1    8525    5141    5239    7911   17890   10000   25       False      True   \n",
      "2     628     662     596     630     664     598   39       False      True   \n",
      "3    4649    3964    3281     934     467   12871   41        True     False   \n",
      "4   46300   10849    8857    9658    9359    9554   55        True     False   \n",
      "\n",
      "   EDUCATION_Graduate School  EDUCATION_High School  EDUCATION_Others  \\\n",
      "0                       True                  False             False   \n",
      "1                      False                   True             False   \n",
      "2                       True                  False             False   \n",
      "3                       True                  False             False   \n",
      "4                      False                   True             False   \n",
      "\n",
      "   EDUCATION_University  MARRIAGE_Married  MARRIAGE_Others  MARRIAGE_Single  \n",
      "0                 False              True            False            False  \n",
      "1                 False             False            False             True  \n",
      "2                 False              True            False            False  \n",
      "3                 False             False            False             True  \n",
      "4                 False              True            False            False  \n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "((14000, 16), (6000, 16), (14000,), (6000,))"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.preprocessing import scale\n",
    "\n",
    "df.RISK = df.RISK.replace({\"LOW\": 0, \"HIGH\": 1})  # 将分类标签替换为数值，方便后面计算\n",
    "\n",
    "train_data = df.iloc[:, :-1]  # 特征数据列\n",
    "train_data = pd.get_dummies(train_data)  # 对特征数据进行独热编码\n",
    "train_data = scale(train_data)  # 规范化处理\n",
    "\n",
    "\n",
    "train_target = df[\"RISK\"]  # 目标数据列\n",
    "\n",
    "# 划分数据集，训练集占 70%，测试集占 30%\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    train_data, train_target, test_size=0.3, random_state=0\n",
    ")\n",
    "\n",
    "X_train.shape, X_test.shape, y_train.shape, y_test.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d0252b5",
   "metadata": {},
   "source": [
    "**----------以下是代码解释----------**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee003993",
   "metadata": {},
   "source": [
    "这段代码的作用是对信用卡风险数据集进行预处理和划分训练集、测试集，具体步骤如下：\n",
    "\n",
    "1. **标签数值化**  \n",
    "   ```python\n",
    "   df.RISK = df.RISK.replace({\"LOW\": 0, \"HIGH\": 1})\n",
    "   ```\n",
    "   将原本为字符串的风险标签（\"LOW\"、\"HIGH\"）转换为数值（0、1），便于后续建模。\n",
    "\n",
    "2. **特征处理**  \n",
    "   ```python\n",
    "   train_data = df.iloc[:, :-1]\n",
    "   train_data = pd.get_dummies(train_data)\n",
    "   print(train_data.head())\n",
    "   train_data = scale(train_data)\n",
    "   ```\n",
    "   - 取出除最后一列（RISK）外的所有特征数据。\n",
    "   - 对类别型特征（如性别、学历、婚姻）进行独热编码，转为数值型。\n",
    "   - 对所有特征进行标准化（均值为0，方差为1），消除量纲影响。\n",
    "\n",
    "3. **目标变量**  \n",
    "   ```python\n",
    "   train_target = df[\"RISK\"]\n",
    "   ```\n",
    "   取出标签列作为目标变量。\n",
    "\n",
    "4. **数据集划分**  \n",
    "   ```python\n",
    "   X_train, X_test, y_train, y_test = train_test_split(\n",
    "       train_data, train_target, test_size=0.3, random_state=0\n",
    "   )\n",
    "   ```\n",
    "   - 将数据集按7:3比例分为训练集和测试集，`random_state=0`保证结果可复现。\n",
    "\n",
    "5. **输出数据形状**  \n",
    "   ```python\n",
    "   X_train.shape, X_test.shape, y_train.shape, y_test.shape\n",
    "   ```\n",
    "   查看训练集和测试集的样本数和特征数。\n",
    "\n",
    "**总结**：  \n",
    "这段代码完成了数据的标签编码、特征独热编码、标准化处理，并将数据集划分为训练集和测试集，为后续模型训练和评估做准备。"
   ]
  },
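  {
   "cell_type": "markdown",
   "id": "c3f9a7d4",
   "metadata": {},
   "source": [
    "To see what the standardization step does, here is a minimal sketch that runs `sklearn.preprocessing.scale` on a hypothetical toy matrix. The array `X` and its values are assumptions chosen for illustration, with two features on very different scales.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.preprocessing import scale\n",
    "\n",
    "# Hypothetical toy matrix: two features on very different scales\n",
    "X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])\n",
    "\n",
    "X_scaled = scale(X)  # Standardize each column independently\n",
    "\n",
    "print(X_scaled.mean(axis=0))  # Approximately 0 for every column\n",
    "print(X_scaled.std(axis=0))  # Approximately 1 for every column\n",
    "```\n",
    "\n",
    "After scaling, both columns hold the same values even though the raw magnitudes differ by a factor of 100, so neither feature dominates simply because of its units."
   ]
  },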
  {
   "cell_type": "markdown",
   "id": "df5b51e8",
   "metadata": {},
   "source": [
    "**----------以上是代码解释----------**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0f70995",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "接下来，我们使用 scikit-learn 建立逻辑回归分类模型。使用训练数据完成模型训练，并使用测试数据对模型进行评价。使用 scikit-learn 训练模型的过程非常简单，实例化相应模型的类之后，使用 ` fit()  ` 即可完成训练。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "5a094328",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>#sk-container-id-1 {\n",
       "  /* Definition of color scheme common for light and dark mode */\n",
       "  --sklearn-color-text: #000;\n",
       "  --sklearn-color-text-muted: #666;\n",
       "  --sklearn-color-line: gray;\n",
       "  /* Definition of color scheme for unfitted estimators */\n",
       "  --sklearn-color-unfitted-level-0: #fff5e6;\n",
       "  --sklearn-color-unfitted-level-1: #f6e4d2;\n",
       "  --sklearn-color-unfitted-level-2: #ffe0b3;\n",
       "  --sklearn-color-unfitted-level-3: chocolate;\n",
       "  /* Definition of color scheme for fitted estimators */\n",
       "  --sklearn-color-fitted-level-0: #f0f8ff;\n",
       "  --sklearn-color-fitted-level-1: #d4ebff;\n",
       "  --sklearn-color-fitted-level-2: #b3dbfd;\n",
       "  --sklearn-color-fitted-level-3: cornflowerblue;\n",
       "\n",
       "  /* Specific color for light theme */\n",
       "  --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
       "  --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
       "  --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
       "  --sklearn-color-icon: #696969;\n",
       "\n",
       "  @media (prefers-color-scheme: dark) {\n",
       "    /* Redefinition of color scheme for dark theme */\n",
       "    --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
       "    --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
       "    --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
       "    --sklearn-color-icon: #878787;\n",
       "  }\n",
       "}\n",
       "\n",
       "#sk-container-id-1 {\n",
       "  color: var(--sklearn-color-text);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 pre {\n",
       "  padding: 0;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 input.sk-hidden--visually {\n",
       "  border: 0;\n",
       "  clip: rect(1px 1px 1px 1px);\n",
       "  clip: rect(1px, 1px, 1px, 1px);\n",
       "  height: 1px;\n",
       "  margin: -1px;\n",
       "  overflow: hidden;\n",
       "  padding: 0;\n",
       "  position: absolute;\n",
       "  width: 1px;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-dashed-wrapped {\n",
       "  border: 1px dashed var(--sklearn-color-line);\n",
       "  margin: 0 0.4em 0.5em 0.4em;\n",
       "  box-sizing: border-box;\n",
       "  padding-bottom: 0.4em;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-container {\n",
       "  /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
       "     but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
       "     so we also need the `!important` here to be able to override the\n",
       "     default hidden behavior on the sphinx rendered scikit-learn.org.\n",
       "     See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
       "  display: inline-block !important;\n",
       "  position: relative;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-text-repr-fallback {\n",
       "  display: none;\n",
       "}\n",
       "\n",
       "div.sk-parallel-item,\n",
       "div.sk-serial,\n",
       "div.sk-item {\n",
       "  /* draw centered vertical line to link estimators */\n",
       "  background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
       "  background-size: 2px 100%;\n",
       "  background-repeat: no-repeat;\n",
       "  background-position: center center;\n",
       "}\n",
       "\n",
       "/* Parallel-specific style estimator block */\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item::after {\n",
       "  content: \"\";\n",
       "  width: 100%;\n",
       "  border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
       "  flex-grow: 1;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel {\n",
       "  display: flex;\n",
       "  align-items: stretch;\n",
       "  justify-content: center;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "  position: relative;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item {\n",
       "  display: flex;\n",
       "  flex-direction: column;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item:first-child::after {\n",
       "  align-self: flex-end;\n",
       "  width: 50%;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item:last-child::after {\n",
       "  align-self: flex-start;\n",
       "  width: 50%;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-parallel-item:only-child::after {\n",
       "  width: 0;\n",
       "}\n",
       "\n",
       "/* Serial-specific style estimator block */\n",
       "\n",
       "#sk-container-id-1 div.sk-serial {\n",
       "  display: flex;\n",
       "  flex-direction: column;\n",
       "  align-items: center;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "  padding-right: 1em;\n",
       "  padding-left: 1em;\n",
       "}\n",
       "\n",
       "\n",
       "/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
       "clickable and can be expanded/collapsed.\n",
       "- Pipeline and ColumnTransformer use this feature and define the default style\n",
       "- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
       "*/\n",
       "\n",
       "/* Pipeline and ColumnTransformer style (default) */\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable {\n",
       "  /* Default theme specific background. It is overwritten whether we have a\n",
       "  specific estimator or a Pipeline/ColumnTransformer */\n",
       "  background-color: var(--sklearn-color-background);\n",
       "}\n",
       "\n",
       "/* Toggleable label */\n",
       "#sk-container-id-1 label.sk-toggleable__label {\n",
       "  cursor: pointer;\n",
       "  display: flex;\n",
       "  width: 100%;\n",
       "  margin-bottom: 0;\n",
       "  padding: 0.5em;\n",
       "  box-sizing: border-box;\n",
       "  text-align: center;\n",
       "  align-items: start;\n",
       "  justify-content: space-between;\n",
       "  gap: 0.5em;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 label.sk-toggleable__label .caption {\n",
       "  font-size: 0.6rem;\n",
       "  font-weight: lighter;\n",
       "  color: var(--sklearn-color-text-muted);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 label.sk-toggleable__label-arrow:before {\n",
       "  /* Arrow on the left of the label */\n",
       "  content: \"▸\";\n",
       "  float: left;\n",
       "  margin-right: 0.25em;\n",
       "  color: var(--sklearn-color-icon);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {\n",
       "  color: var(--sklearn-color-text);\n",
       "}\n",
       "\n",
       "/* Toggleable content - dropdown */\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable__content {\n",
       "  max-height: 0;\n",
       "  max-width: 0;\n",
       "  overflow: hidden;\n",
       "  text-align: left;\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable__content.fitted {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable__content pre {\n",
       "  margin: 0.2em;\n",
       "  border-radius: 0.25em;\n",
       "  color: var(--sklearn-color-text);\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-toggleable__content.fitted pre {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
       "  /* Expand drop-down */\n",
       "  max-height: 200px;\n",
       "  max-width: 100%;\n",
       "  overflow: auto;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
       "  content: \"▾\";\n",
       "}\n",
       "\n",
       "/* Pipeline/ColumnTransformer-specific style */\n",
       "\n",
       "#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
       "  color: var(--sklearn-color-text);\n",
       "  background-color: var(--sklearn-color-unfitted-level-2);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
       "  background-color: var(--sklearn-color-fitted-level-2);\n",
       "}\n",
       "\n",
       "/* Estimator-specific style */\n",
       "\n",
       "/* Colorize estimator box */\n",
       "#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-2);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-2);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-label label.sk-toggleable__label,\n",
       "#sk-container-id-1 div.sk-label label {\n",
       "  /* The background is the default theme color */\n",
       "  color: var(--sklearn-color-text-on-default-background);\n",
       "}\n",
       "\n",
       "/* On hover, darken the color of the background */\n",
       "#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {\n",
       "  color: var(--sklearn-color-text);\n",
       "  background-color: var(--sklearn-color-unfitted-level-2);\n",
       "}\n",
       "\n",
       "/* Label box, darken color on hover, fitted */\n",
       "#sk-container-id-1 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
       "  color: var(--sklearn-color-text);\n",
       "  background-color: var(--sklearn-color-fitted-level-2);\n",
       "}\n",
       "\n",
       "/* Estimator label */\n",
       "\n",
       "#sk-container-id-1 div.sk-label label {\n",
       "  font-family: monospace;\n",
       "  font-weight: bold;\n",
       "  display: inline-block;\n",
       "  line-height: 1.2em;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-label-container {\n",
       "  text-align: center;\n",
       "}\n",
       "\n",
       "/* Estimator-specific */\n",
       "#sk-container-id-1 div.sk-estimator {\n",
       "  font-family: monospace;\n",
       "  border: 1px dotted var(--sklearn-color-border-box);\n",
       "  border-radius: 0.25em;\n",
       "  box-sizing: border-box;\n",
       "  margin-bottom: 0.5em;\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-0);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-estimator.fitted {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-0);\n",
       "}\n",
       "\n",
       "/* on hover */\n",
       "#sk-container-id-1 div.sk-estimator:hover {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-2);\n",
       "}\n",
       "\n",
       "#sk-container-id-1 div.sk-estimator.fitted:hover {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-2);\n",
       "}\n",
       "\n",
       "/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
       "\n",
       "/* Common style for \"i\" and \"?\" */\n",
       "\n",
       ".sk-estimator-doc-link,\n",
       "a:link.sk-estimator-doc-link,\n",
       "a:visited.sk-estimator-doc-link {\n",
       "  float: right;\n",
       "  font-size: smaller;\n",
       "  line-height: 1em;\n",
       "  font-family: monospace;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "  border-radius: 1em;\n",
       "  height: 1em;\n",
       "  width: 1em;\n",
       "  text-decoration: none !important;\n",
       "  margin-left: 0.5em;\n",
       "  text-align: center;\n",
       "  /* unfitted */\n",
       "  border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
       "  color: var(--sklearn-color-unfitted-level-1);\n",
       "}\n",
       "\n",
       ".sk-estimator-doc-link.fitted,\n",
       "a:link.sk-estimator-doc-link.fitted,\n",
       "a:visited.sk-estimator-doc-link.fitted {\n",
       "  /* fitted */\n",
       "  border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
       "  color: var(--sklearn-color-fitted-level-1);\n",
       "}\n",
       "\n",
       "/* On hover */\n",
       "div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
       ".sk-estimator-doc-link:hover,\n",
       "div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
       ".sk-estimator-doc-link:hover {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-3);\n",
       "  color: var(--sklearn-color-background);\n",
       "  text-decoration: none;\n",
       "}\n",
       "\n",
       "div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
       ".sk-estimator-doc-link.fitted:hover,\n",
       "div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
       ".sk-estimator-doc-link.fitted:hover {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-3);\n",
       "  color: var(--sklearn-color-background);\n",
       "  text-decoration: none;\n",
       "}\n",
       "\n",
       "/* Span, style for the box shown on hovering the info icon */\n",
       ".sk-estimator-doc-link span {\n",
       "  display: none;\n",
       "  z-index: 9999;\n",
       "  position: relative;\n",
       "  font-weight: normal;\n",
       "  right: .2ex;\n",
       "  padding: .5ex;\n",
       "  margin: .5ex;\n",
       "  width: min-content;\n",
       "  min-width: 20ex;\n",
       "  max-width: 50ex;\n",
       "  color: var(--sklearn-color-text);\n",
       "  box-shadow: 2pt 2pt 4pt #999;\n",
       "  /* unfitted */\n",
       "  background: var(--sklearn-color-unfitted-level-0);\n",
       "  border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
       "}\n",
       "\n",
       ".sk-estimator-doc-link.fitted span {\n",
       "  /* fitted */\n",
       "  background: var(--sklearn-color-fitted-level-0);\n",
       "  border: var(--sklearn-color-fitted-level-3);\n",
       "}\n",
       "\n",
       ".sk-estimator-doc-link:hover span {\n",
       "  display: block;\n",
       "}\n",
       "\n",
       "/* \"?\"-specific style due to the `<a>` HTML tag */\n",
       "\n",
       "#sk-container-id-1 a.estimator_doc_link {\n",
       "  float: right;\n",
       "  font-size: 1rem;\n",
       "  line-height: 1em;\n",
       "  font-family: monospace;\n",
       "  background-color: var(--sklearn-color-background);\n",
       "  border-radius: 1rem;\n",
       "  height: 1rem;\n",
       "  width: 1rem;\n",
       "  text-decoration: none;\n",
       "  /* unfitted */\n",
       "  color: var(--sklearn-color-unfitted-level-1);\n",
       "  border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 a.estimator_doc_link.fitted {\n",
       "  /* fitted */\n",
       "  border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
       "  color: var(--sklearn-color-fitted-level-1);\n",
       "}\n",
       "\n",
       "/* On hover */\n",
       "#sk-container-id-1 a.estimator_doc_link:hover {\n",
       "  /* unfitted */\n",
       "  background-color: var(--sklearn-color-unfitted-level-3);\n",
       "  color: var(--sklearn-color-background);\n",
       "  text-decoration: none;\n",
       "}\n",
       "\n",
       "#sk-container-id-1 a.estimator_doc_link.fitted:hover {\n",
       "  /* fitted */\n",
       "  background-color: var(--sklearn-color-fitted-level-3);\n",
       "}\n",
       "</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>LogisticRegression()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" checked><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow\"><div><div>LogisticRegression</div></div><div><a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.6/modules/generated/sklearn.linear_model.LogisticRegression.html\">?<span>Documentation for LogisticRegression</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></div></label><div class=\"sk-toggleable__content fitted\"><pre>LogisticRegression()</pre></div> </div></div></div></div>"
      ],
      "text/plain": [
       "LogisticRegression()"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "model = LogisticRegression(solver=\"lbfgs\")  # 定义逻辑回归模型\n",
    "model.fit(X_train, y_train)  # 使用训练数据完成模型训练"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5a7aab1",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "对于分类模型而言，我们一般会使用准确率来对模型进行评价。详细来讲，通过已经训练好的模型在测试集上进行预测，从而计算出预测的准确率。 \n",
    "\n",
    "##  16.4.  准确率 Accuracy  # \n",
    "\n",
    "信用卡风险预测模型中，目标值对应了两个类别，常被称之为二分类问题。在二分类问题中，我们常常会定义正类和负类，例如这里我们定 HIGH 为 正类，LOW 为负类（你也可以反向定义）。那么，下面就可以给出实际类别（行名）和预测类别（列名）的混淆矩阵。 \n",
    "\n",
    "信用风险  |  HIGH  |  LOW   \n",
    "---|---|---  \n",
    "HIGH  |  True Positive (TP)  |  False Negative (FN)   \n",
    "LOW  |  False Positive (FP)  |  True Negative (TN)   \n",
    "  \n",
    "上表详细来讲： \n",
    "\n",
    "  * TP：将正类预测为正类数 → 预测正确 \n",
    "\n",
    "  * TN：将负类预测为负类数 → 预测正确 \n",
    "\n",
    "  * FP：将负类预测为正类数 → 预测错误 \n",
    "\n",
    "  * FN：将正类预测为负类数 → 预测遗漏 \n",
    "\n",
    "根据该混淆矩阵，我们就可以给出分类模型的常用评价指标的计算方式了。 \n",
    "\n",
    "准确率及正确分类的测试样本个数占测试实例总样本个数的比例。那么，准确率（准确度）计算公式就为： \n",
    "\n",
    "$$ Accuracy = \\frac{TP+TN}{TP+TN+FP+FN} \\tag{1} $$ \n",
    "\n",
    "当然，为了下面实现方便我们也可以将分类准确率写成下面的形式： \n",
    "\n",
    "$$ acc=\\frac{\\sum_{i=1}^{N}I(\\bar{y_{i}}=y_{i})}{N} \\tag{2} $$ \n",
    "\n",
    "式中，  $N$  表示数据总条数，  $\\bar{y_{i}}$  表示第  $i$  条数据的种类预测值，  $y_{i}$  表示第  $i$  条数据的种类真实值，  $I$  同样是指示函数，表示  $\\bar{y_{i}}$  和  $y_{i}$  相同的个数。 \n",
    "\n",
    "首先，我们需要得到模型的预测结果。这一步也非常简单，只需要使用 ` predict  ` 方法即可。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "6ef15876",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 1, 1, ..., 1, 1, 1], shape=(6000,))"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_pred = model.predict(X_test)  # 输入测试集特征数据得到预测结果\n",
    "y_pred"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "9b696268",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def get_accuracy(test_labels, pred_lables):\n",
    "    # 准确率计算公式，根据公式 2 实现\n",
    "    correct = np.sum(test_labels == pred_lables)  # 计算预测正确的数据个数\n",
    "    n = len(test_labels)  # 总测试集数据个数\n",
    "    acc = correct / n\n",
    "    return acc"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "409c8b9d",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "现在，只需要输入真实标签 ` y_test  ` 和模型预测标签 ` y_pred  ` 即可。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "d3bee814",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(0.7678333333333334)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_accuracy(y_test, y_pred)  # 计算模型预测准确率"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1979f7f",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "你也可以直接使用准确率的 scikit-learn 计算方法： ` sklearn.metrics.accuracy_score(y_true,  y_pred)  ` 。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "19f24c10",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7678333333333334"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "accuracy_score(y_test, y_pred)  # 传入真实类别和预测类别"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a38bf6e",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "实际上，scikit-learn 建模时也可以直接使用 ` model.score()  ` 求得分类准确率： "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "639739df",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7678333333333334"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.score(X_test, y_test)  # 传入测试数据特征和类别"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7450ccf8",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "上面，我们就说完了常用的 3 种分类准确率计算方法。分类模型作为机器学习中最常遇到的建模问题，除了使用「准确率」对模型进行评价，实际上还有另外几种常用的性能评价指标，在此一并介绍。 \n",
    "\n",
    "##  16.5.  查准率 Precision  # \n",
    "\n",
    "查准率又称精确率，即正确分类的正例个数占分类为正例的实例个数的比例。 \n",
    "\n",
    "$$Precision = \\frac{TP}{TP+FP} \\tag{3}$$ \n",
    "\n",
    "查准率的 scikit-learn 计算方法： ` sklearn.metrics.precision_score(y_true,  y_pred)  ` 。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c204a108",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7678333333333334"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.metrics import precision_score\n",
    "\n",
    "precision_score(y_test, y_pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e553e9da",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "##  16.6.  召回率 Recall  # \n",
    "\n",
    "召回率又称查全率，即正确分类的正例个数占实际正例个数的比例。 \n",
    "\n",
    "$$Recall = \\frac{TP}{TP+FN} \\tag{4}$$ \n",
    "\n",
    "召回率的 scikit-learn 计算方法： ` sklearn.metrics.recall_score(y_true,  y_pred)  ` 。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "f9fbc04c",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.metrics import recall_score\n",
    "\n",
    "recall_score(y_test, y_pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5aeb3de",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "##  16.7.  F1 值  # \n",
    "\n",
    "F1 值是查准率和召回率的加权平均数 \n",
    "\n",
    "$$F1 = \\frac{2*(Precision * Recall)}{Precision + Recall} \\tag{5}$$ \n",
    "\n",
    "F1 相当于精确率和召回率的综合评价指标，对衡量数据更有利，也比较常用。 \n",
    "\n",
    "F1 值的 scikit-learn 计算方法： ` sklearn.metrics.f1_score(y_true,  y_pred)  ` 。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "ed279728",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8686716319411709"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.metrics import f1_score\n",
    "\n",
    "f1_score(y_test, y_pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16b9d719",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "##  16.8.  ROC 曲线  # \n",
    "\n",
    "部分分类模型中（如：逻辑回归），通常会设定一个阈值，并规定大于该阈值为正类，小于则为负类。所以，当我们减小阀值时，将会有更多的样本被划分到正类。这样会提高正类的识别率，但同时也会使得更多的负类被错误识别为正类。 \n",
    "\n",
    "所以，ROC 曲线的目的在用形象化该变化过程，从而评价一个分类器好坏。 \n",
    "\n",
    "ROC 曲线中有两个指标，分别是 TPR 和 FPR，公式如下： \n",
    "\n",
    "$$TPR = \\frac{TP}{TP+FN} \\tag{6a}$$ \n",
    "\n",
    "$$FPR = \\frac{FP}{FP+TN} \\tag{6b}$$ \n",
    "\n",
    "其中，TPR 代表能将正例分对的概率（召回率），而 FPR 则代表将负例错分为正例的概率。 \n",
    "\n",
    "ROC 曲线中，我们将横轴定为 FPR，纵轴定为 TPR，从而可以直观看出 FPR 与 TPR 之间的关系。 \n",
    "\n",
    "![image](https://cdn.aibydoing.com/aibydoing/images/document-uid214893labid7506timestamp1540448403508.svg)\n",
    "\n",
    "[ 来源 ](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)\n",
    "\n",
    "那么： \n",
    "\n",
    "  * 当 FPR=0，TPR=0 时，意味着将每一个实例都预测为负例。 \n",
    "\n",
    "  * 当 FPR=1，TPR=1 时，意味着将每一个实例都预测为正例。 \n",
    "\n",
    "  * 当 FPR=0，TPR=1 时，意味着为最优分类器点。 \n",
    "\n",
    "那么，一个优秀分类器对应的 ROC 曲线应该尽量靠近左上角。当曲线越接近于 45 度对角线，则分类器效果越差。 \n",
    "\n",
    "ROC 曲线的 scikit-learn 计算方法： ` sklearn.metrics.roc_curve(y_true,  y_score)  ` 。 \n",
    "\n",
    "虽然使用 ROC 曲线来表示分类器好坏很直观，但人们往往更喜欢使用数值来评价分类器，此时就提出了 AUC 的概念。AUC 的全称为 Area Under Curve，意思是曲线下面积，即 ROC 曲线下面积。 \n",
    "\n",
    "  * $AUC=1$  ：完美分类器。 \n",
    "\n",
    "  * $0.5<AUC<1$  ：分类器优于随机猜测。 \n",
    "\n",
    "  * $AUC=0.5$  ：分类器和随机猜测的结果接近。 \n",
    "\n",
    "  * $AUC<0.5$  ：分类器比随机猜测的结果还差。 \n",
    "\n",
    "AUC 的 scikit-learn 计算方法： ` sklearn.metrics.auc(fpr,  tpr)  ` 。 \n",
    "\n",
    "下面，我们就绘制出本次预测结果的 ROC 曲线： "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7373c287",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "from matplotlib import pyplot as plt\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "from sklearn.metrics import roc_curve\n",
    "from sklearn.metrics import auc\n",
    "\n",
    "y_score = model.decision_function(X_test)\n",
    "fpr, tpr, _ = roc_curve(y_test, y_score)\n",
    "roc_auc = auc(fpr, tpr)\n",
    "\n",
    "plt.plot(fpr, tpr, label=\"ROC curve (area = %0.2f)\" % roc_auc)\n",
    "plt.plot([0, 1], [0, 1], color=\"navy\", linestyle=\"--\")\n",
    "plt.xlabel(\"False Positive Rate\")\n",
    "plt.ylabel(\"True Positive Rate\")\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9fc733f",
   "metadata": {},
   "source": [
    "[ ![../_images/9251f8bbdf75a62dbfca0164d0cd691d76f63eb3828f0c31523f7383f079f861.png](../_images/9251f8bbdf75a62dbfca0164d0cd691d76f63eb3828f0c31523f7383f079f861.png) ](../_images/9251f8bbdf75a62dbfca0164d0cd691d76f63eb3828f0c31523f7383f079f861.png)\n",
    "\n",
    "##  16.9.  总结  # \n",
    "\n",
    "本次实验介绍了分类预测后常用的几个评价指标，它们分别是：准确率，查准率，召回率，F1 值，ROC 曲线。对于这几种方法，接下来要多做练习并熟练掌握。 "
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  },
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
